Load packages
library(tidyr)
library(dplyr)
library(tibble)
library(pillar)
library(stringr)
library(brms)
options(brms.backend = "cmdstanr", mc.cores = 2)
library(posterior)
options(pillar.negative = FALSE)
library(loo)
library(priorsense)
library(ggplot2)
library(bayesplot)
theme_set(bayesplot::theme_default(base_family = "sans"))
library(tidybayes)
library(ggdist)
library(patchwork)
library(RColorBrewer)
SEED <- 48927 # set random seed for reproducibility
This notebook contains several examples of how to use Stan in R with brms. This notebook assumes basic knowledge of Bayesian inference and MCMC. The examples are related to the Bayesian Data Analysis course.
Toy data with a sequence of failures (0) and successes (1). We would like to learn about the unknown probability of success.
data_bern <- data.frame(y = c(1, 1, 1, 0, 1, 1, 1, 0, 1, 0))
As usual for generalized linear models (GLMs), brms defines the priors on the latent model parameters. With Bernoulli the default link function is logit, and thus the prior is set on logit(theta). As there are no covariates, logit(theta) = Intercept. The brms default prior for Intercept is student_t(3, 0, 2.5), but we use student_t(7, 0, 1.5), which is close to the logistic distribution and thus makes the prior on theta near-uniform. We visualize the implied priors on theta by sampling from the priors on the logit scale, and we also compare to a normal(0, 1.5) prior on the logit probability.
data.frame(theta = plogis(ggdist::rstudent_t(n=20000, df=3, mu=0, sigma=2.5))) |>
mcmc_hist() +
xlim(c(0,1)) +
labs(title='Default brms student_t(3, 0, 2.5) prior on Intercept')
data.frame(theta = plogis(ggdist::rstudent_t(n=20000, df=7, mu=0, sigma=1.5))) |>
mcmc_hist() +
xlim(c(0,1)) +
labs(title='student_t(7, 0, 1.5) prior on Intercept')
An almost uniform prior on theta could also be obtained with normal(0, 1.5).
data.frame(theta = plogis(rnorm(n=20000, mean=0, sd=1.5))) |>
mcmc_hist() +
xlim(c(0,1)) +
labs(title='normal(0, 1.5) prior on Intercept')
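The histograms above can also be summarized numerically. A minimal base-R sketch (only the stats package is assumed): by the probability integral transform, a standard logistic prior on logit(theta) implies an exactly uniform prior on theta, and comparing deciles shows how close each Student-t prior comes to that. Here `s * rt(n, df)` is used as a draw from student_t(df, 0, s).

```r
set.seed(48927)
n <- 20000
probs <- seq(0.1, 0.9, by = 0.1)
# Standard logistic prior on logit(theta): plogis() is the logistic CDF,
# so the implied prior on theta is exactly uniform(0, 1)
theta_logistic <- plogis(rlogis(n))
# student_t(7, 0, 1.5) prior on logit(theta): scale a standard t draw by 1.5
theta_t7 <- plogis(1.5 * rt(n, df = 7))
# Default brms student_t(3, 0, 2.5) prior on the Intercept
theta_t3 <- plogis(2.5 * rt(n, df = 3))
# Deciles of the implied priors; a uniform prior gives 0.1, 0.2, ..., 0.9
deciles <- rbind(logistic = quantile(theta_logistic, probs),
                 t7_1.5 = quantile(theta_t7, probs),
                 t3_2.5 = quantile(theta_t3, probs))
round(deciles, 2)
```

The student_t(7, 0, 1.5) deciles stay close to the uniform ones, while the default student_t(3, 0, 2.5) pushes mass towards 0 and 1.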
Formula y ~ 1 corresponds to a model \(\mathrm{logit}(\theta) = \alpha\times 1 = \alpha\). `brms` denotes \(\alpha\) as `Intercept`.
fit_bern <- brm(y ~ 1, family = bernoulli(), data = data_bern,
prior = prior(student_t(7, 0, 1.5), class='Intercept'),
seed = SEED, refresh = 0)
Check the summary of the posterior and inference diagnostics.
fit_bern
Family: bernoulli
Links: mu = logit
Formula: y ~ 1
Data: data_bern (Number of observations: 10)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.76 0.64 -0.43 2.09 1.00 1734 1726
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Extract the posterior draws
draws <- as_draws_df(fit_bern)
We can get summary information using summarise_draws()
draws |>
subset_draws(variable='b_Intercept') |>
summarise_draws()
# A tibble: 1 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.763 0.746 0.641 0.636 -0.242 1.90 1.00 1734. 1726.
We can compute the probability of success by using plogis, which is the inverse-logit function.
draws <- draws |>
mutate_variables(theta=plogis(b_Intercept))
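plogis() and qlogis() are the base-R logistic CDF and quantile function, that is, the inverse-logit and the logit. A small sketch checking this against the explicit formula, and illustrating why theta is computed draw by draw: the median carries over through the monotone transformation, but the mean does not. The values 0.746 and 0.763 are the posterior median and mean of b_Intercept from the summary above.

```r
# Explicit inverse-logit for comparison with plogis()
inv_logit <- function(x) 1 / (1 + exp(-x))
x <- c(-2, 0, 0.5, 2)
stopifnot(isTRUE(all.equal(plogis(x), inv_logit(x))))
# qlogis() is the logit, i.e. the inverse of plogis()
stopifnot(isTRUE(all.equal(qlogis(plogis(x)), x)))
# Posterior median and mean of b_Intercept reported above
plogis(0.746)  # about 0.678, matches the posterior median of theta
plogis(0.763)  # about 0.682, larger than the posterior mean of theta (0.668)
```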
Summary of theta by using summarise_draws()
draws |>
subset_draws(variable='theta') |>
summarise_draws()
# A tibble: 1 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 theta 0.668 0.678 0.130 0.134 0.440 0.870 1.00 1734. 1726.
Histogram of theta
mcmc_hist(draws, pars='theta') +
xlab('theta') +
xlim(c(0,1))
Perform a prior sensitivity analysis by power-scaling both the prior and the likelihood. We focus on theta, which is the quantity of interest.
theta <- draws |>
subset_draws(variable='theta')
powerscale_sensitivity(fit_bern, prediction = \(x, ...) theta, num_args=list(digits=2)
)$sensitivity |>
filter(variable=='theta') |>
mutate(across(where(is.double), ~num(.x, digits=2)))
# A tibble: 1 × 4
variable prior likelihood diagnosis
<chr> <num:.2!> <num:.2!> <chr>
1 theta 0.04 0.11 -
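For intuition about what power-scaling means, here is a conjugate sketch. It is not what priorsense computes (priorsense works from the existing posterior draws); it assumes a hypothetical Beta(2, 2) prior directly on theta instead of the logit-scale Student-t prior, so that raising the prior to a power gp and the likelihood to a power gl keeps the posterior in the Beta family: Beta(gp*(a-1) + 1 + gl*y, gp*(b-1) + 1 + gl*(n-y)).

```r
# Hypothetical Beta(a_prior, b_prior) prior and the toy data: 7 successes in 10 trials
a_prior <- 2; b_prior <- 2; y_succ <- 7; n_trials <- 10
# Posterior mean of theta when the prior is raised to power gp and the likelihood to gl
post_mean <- function(gp = 1, gl = 1) {
  shape1 <- gp * (a_prior - 1) + 1 + gl * y_succ
  shape2 <- gp * (b_prior - 1) + 1 + gl * (n_trials - y_succ)
  shape1 / (shape1 + shape2)
}
base <- post_mean()
# Shift in the posterior mean when scaling by 0.8 and by 1.25
prior_shift <- c(post_mean(gp = 0.8), post_mean(gp = 1.25)) - base
lik_shift <- c(post_mean(gl = 0.8), post_mean(gl = 1.25)) - base
round(c(base = base, prior = prior_shift, lik = lik_shift), 4)
```

In this sketch the posterior mean moves more when the likelihood is power-scaled than when the prior is, which is the kind of pattern the sensitivity values summarize.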
Instead of a sequence of 0’s and 1’s, we can summarize the data with the number of trials and the number of successes and use a binomial model. The prior is specified in the ‘latent space’. The actual probability of success is theta = plogis(alpha), where plogis is the inverse of the logit function.
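The Bernoulli and binomial models give the same posterior because, as a function of theta, the binomial likelihood of the counts differs from the Bernoulli likelihood of the sequence only by a constant factor, the binomial coefficient. A minimal base-R check on a grid of logit(theta) values:

```r
y_seq <- c(1, 1, 1, 0, 1, 1, 1, 0, 1, 0)  # the toy data from data_bern
alpha_grid <- seq(-3, 3, by = 0.5)         # grid of values for logit(theta)
theta_grid <- plogis(alpha_grid)
# Bernoulli log-likelihood of the whole sequence at each theta
ll_bern <- sapply(theta_grid, function(th) sum(dbinom(y_seq, size = 1, prob = th, log = TRUE)))
# Binomial log-likelihood of 7 successes out of 10 trials at each theta
ll_bin <- dbinom(sum(y_seq), size = length(y_seq), prob = theta_grid, log = TRUE)
# The difference is the same constant, log(choose(10, 7)), for every theta
ll_bin - ll_bern
log(choose(10, 7))
```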
Binomial model with the same data and prior
data_bin <- data.frame(N = c(10), y = c(7))
Formula y | trials(N) ~ 1 corresponds to a model \(\mathrm{logit}(\theta) = \alpha\), and the number of trials for each observation is provided by | trials(N).
fit_bin <- brm(y | trials(N) ~ 1, family = binomial(), data = data_bin,
prior = prior(student_t(7, 0,1.5), class='Intercept'),
seed = SEED, refresh = 0)
Check the summary of the posterior and inference diagnostics.
fit_bin
Family: binomial
Links: mu = logit
Formula: y | trials(N) ~ 1
Data: data_bin (Number of observations: 1)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.77 0.64 -0.46 2.09 1.00 1660 1769
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
If a prior sensitivity diagnostic indicates prior-data conflict, that is, both the prior and the likelihood are informative, this is fine when there is genuine strong prior information that justifies the prior, but otherwise more thinking is required (the goal is not to adjust the prior to remove diagnostic warnings without thinking). In this toy example, we proceed with this prior.
Extract the posterior draws
draws <- as_draws_df(fit_bin)
We can get summary information using summarise_draws()
draws |>
subset_draws(variable='b_Intercept') |>
summarise_draws()
# A tibble: 1 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.767 0.758 0.636 0.622 -0.249 1.88 1.00 1660. 1769.
We can compute the probability of success by using plogis, which is the inverse-logit function.
draws <- draws |>
mutate_variables(theta=plogis(b_Intercept))
Summary of theta by using summarise_draws()
draws |>
subset_draws(variable='theta') |>
summarise_draws()
# A tibble: 1 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 theta 0.669 0.681 0.130 0.132 0.438 0.868 1.00 1660. 1769.
Histogram of theta
mcmc_hist(draws, pars='theta') +
xlab('theta') +
xlim(c(0,1))
Re-run the model with a new dataset without recompiling.
data_bin <- data.frame(N = c(5), y = c(4))
fit_bin <- update(fit_bin, newdata = data_bin)
Check the summary of the posterior and inference diagnostics.
fit_bin
Family: binomial
Links: mu = logit
Formula: y | trials(N) ~ 1
Data: data_bin (Number of observations: 1)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.04 0.92 -0.52 3.05 1.00 1374 1345
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Extract the posterior draws
draws <- as_draws_df(fit_bin)
We can get summary information using summarise_draws()
draws |>
subset_draws(variable='b_Intercept') |>
summarise_draws()
# A tibble: 1 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.04 0.961 0.917 0.858 -0.305 2.68 1.00 1374. 1345.
We can compute the probability of success by using plogis, which is the inverse-logit function.
draws <- draws |>
mutate_variables(theta=plogis(b_Intercept))
Summary of theta by using summarise_draws()
draws |>
subset_draws(variable='theta') |>
summarise_draws()
# A tibble: 1 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 theta 0.707 0.723 0.157 0.166 0.424 0.936 1.00 1374. 1345.
Histogram of theta
mcmc_hist(draws, pars='theta') +
xlab('theta') +
xlim(c(0,1))
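An experiment was performed to estimate the effect of beta-blockers on mortality of cardiac patients. A group of patients was randomly assigned to treatment and control groups: out of 674 patients receiving the control, 39 died, and out of 680 patients receiving the treatment, 22 died.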
Data, where grp2 is an indicator variable defined as a factor type, which is useful for categorical variables.
data_bin2 <- data.frame(N = c(674, 680), y = c(39,22), grp2 = factor(c('control','treatment')))
To analyse whether the treatment is useful, we can use a binomial model for both groups and compute the odds ratio. To create the model as two independent (separate) binomial models, we use the formula y | trials(N) ~ 0 + grp2, which corresponds to a model \(\mathrm{logit}(\theta) = \alpha \times 0 + \beta_\mathrm{control}\times x_\mathrm{control} + \beta_\mathrm{treatment}\times x_\mathrm{treatment} = \beta_\mathrm{control}\times x_\mathrm{control} + \beta_\mathrm{treatment}\times x_\mathrm{treatment}\), where \(x_\mathrm{control}\) is a vector with 1 for control and 0 for treatment, and \(x_\mathrm{treatment}\) is a vector with 1 for treatment and 0 for control. As only one of the vectors has a 1 for each observation, this corresponds to separate models \(\mathrm{logit}(\theta_\mathrm{control}) = \beta_\mathrm{control}\) and \(\mathrm{logit}(\theta_\mathrm{treatment}) = \beta_\mathrm{treatment}\). We can provide the same prior for all \(\beta\)’s by setting the prior with class='b'. With the prior student_t(7, 0, 1.5), both \(\beta\)’s are shrunk towards 0, but independently.
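The indicator vectors \(x_\mathrm{control}\) and \(x_\mathrm{treatment}\) can be inspected with base R's model.matrix(). For simple terms like these, its column names match the names brms reports (grp2control, grp2treatment); treating it as an illustration of brms's design matrix is an assumption of this sketch, not a description of brms internals.

```r
data_bin2 <- data.frame(N = c(674, 680), y = c(39, 22),
                        grp2 = factor(c('control', 'treatment')))
# ~ 0 + grp2: one indicator column per group and no intercept
X_sep <- model.matrix(~ 0 + grp2, data = data_bin2)
# ~ grp2, i.e. ~ 1 + grp2: intercept column plus an indicator for treatment
X_ref <- model.matrix(~ grp2, data = data_bin2)
X_sep
X_ref
```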
fit_bin2 <- brm(y | trials(N) ~ 0 + grp2, family = binomial(), data = data_bin2,
prior = prior(student_t(7, 0,1.5), class='b'),
seed = SEED, refresh = 0)
Check the summary of the posterior and inference diagnostics. With ~ 0 + grp2 there is no Intercept, and \(\beta_\mathrm{control}\) and \(\beta_\mathrm{treatment}\) are presented as grp2control and grp2treatment. With the formula ~ grp2, brms would instead use the first factor level, control, as the baseline and report a population-level effect only for treatment, shown as grp2treatment.
fit_bin2
Family: binomial
Links: mu = logit
Formula: y | trials(N) ~ 0 + grp2
Data: data_bin2 (Number of observations: 2)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
grp2control -2.77 0.16 -3.10 -2.48 1.00 3563 3085
grp2treatment -3.37 0.22 -3.81 -2.93 1.00 3824 1939
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Compute theta for each group and the odds ratio. brms uses the variable names b_grp2control and b_grp2treatment for \(\beta_\mathrm{control}\) and \(\beta_\mathrm{treatment}\), respectively.
draws_bin2 <- as_draws_df(fit_bin2) |>
mutate(theta_control = plogis(b_grp2control),
theta_treatment = plogis(b_grp2treatment),
oddsratio = (theta_treatment/(1-theta_treatment))/(theta_control/(1-theta_control)))
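Because the odds are exp(logit(theta)), the odds ratio computed above equals exp(b_grp2treatment - b_grp2control), and any term shared by both linear predictors (such as an intercept) cancels. A quick numeric check, using hypothetical normal draws as stand-ins for posterior draws:

```r
set.seed(48927)
# Hypothetical stand-ins for posterior draws of the two group coefficients
b_control <- rnorm(1000, mean = -2.8, sd = 0.16)
b_treatment <- rnorm(1000, mean = -3.4, sd = 0.22)
common <- rnorm(1000)  # a shared term, e.g. an intercept, which cancels
odds <- function(theta) theta / (1 - theta)
or_direct <- odds(plogis(common + b_treatment)) / odds(plogis(common + b_control))
or_shortcut <- exp(b_treatment - b_control)
all.equal(or_direct, or_shortcut)
```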
Plot oddsratio
mcmc_hist(draws_bin2, pars='oddsratio') +
scale_x_continuous(breaks=seq(0.2,1.6,by=0.2))+
geom_vline(xintercept=1, linetype='dashed')
Probability that the oddsratio<1
draws_bin2 |>
mutate(poddsratio = oddsratio<1) |>
subset(variable='poddsratio') |>
summarise_draws(mean, mcse_mean)
# A tibble: 1 × 3
variable mean mcse_mean
<chr> <dbl> <dbl>
1 poddsratio 0.986 0.00230
oddsratio 95% posterior interval
draws_bin2 |>
subset(variable='oddsratio') |>
summarise_draws(~quantile(.x, probs = c(0.025, 0.975)), ~mcse_quantile(.x, probs = c(0.025, 0.975)))
# A tibble: 1 × 5
variable `2.5%` `97.5%` mcse_q2.5 mcse_q97.5
<chr> <dbl> <dbl> <dbl> <dbl>
1 oddsratio 0.317 0.931 0.00586 0.0134
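As a rough cross-check with a different prior, a conjugate sketch: assuming independent uniform Beta(1, 1) priors directly on theta_control and theta_treatment (not the logit-scale student_t(7, 0, 1.5) prior used in fit_bin2), the posteriors are Beta(1 + deaths, 1 + survivors) and the odds ratio can be simulated without MCMC. The results should be close to, but not identical with, the brms results.

```r
set.seed(48927)
S <- 4000
# Posterior draws under Beta(1, 1) priors: Beta(1 + deaths, 1 + survivors)
th_control <- rbeta(S, 1 + 39, 1 + 674 - 39)
th_treatment <- rbeta(S, 1 + 22, 1 + 680 - 22)
or_conj <- (th_treatment / (1 - th_treatment)) / (th_control / (1 - th_control))
mean(or_conj < 1)                           # Pr(oddsratio < 1)
quantile(or_conj, probs = c(0.025, 0.975))  # 95% posterior interval
```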
Perform a prior sensitivity analysis by power-scaling both the prior and the likelihood. We focus on oddsratio, which is the quantity of interest. We see that the likelihood is much more informative than the prior, and we would expect to see a different posterior only with a highly informative prior (possibly based on previous similar experiments).
oddsratio <- draws_bin2 |>
subset_draws(variable='oddsratio')
powerscale_sensitivity(fit_bin2, prediction = \(x, ...) oddsratio, num_args=list(digits=2)
)$sensitivity |>
filter(variable=='oddsratio') |>
mutate(across(where(is.double), ~num(.x, digits=2)))
# A tibble: 1 × 4
variable prior likelihood diagnosis
<chr> <num:.2!> <num:.2!> <chr>
1 oddsratio 0.01 0.14 -
Above we used the formula y | trials(N) ~ 0 + grp2 to have separate models for the control and treatment groups. An alternative model y | trials(N) ~ grp2, which is equal to y | trials(N) ~ 1 + grp2, would correspond to a model \(\mathrm{logit}(\theta) = \alpha \times 1 + \beta_\mathrm{treatment}\times x_\mathrm{treatment} = \alpha + \beta_\mathrm{treatment}\times x_\mathrm{treatment}\). Now \(\alpha\) models the probability of death (via the logistic link) in the control group and \(\alpha + \beta_\mathrm{treatment}\) models the probability of death (via the logistic link) in the treatment group. Now the models for the groups are connected. Furthermore, if we set independent student_t(7, 0, 1.5) priors on \(\alpha\) and \(\beta_\mathrm{treatment}\), the implied priors on \(\theta_\mathrm{control}\) and \(\theta_\mathrm{treatment}\) are different. We can verify this with a prior simulation.
data.frame(theta_control = plogis(ggdist::rstudent_t(n=20000, df=7, mu=0, sigma=1.5))) |>
mcmc_hist() +
xlim(c(0,1)) +
labs(title='student_t(7, 0, 1.5) prior on Intercept') +
# logit(theta_treatment) = Intercept + b_grp2treatment, so sum the draws before plogis()
data.frame(theta_treatment = plogis(ggdist::rstudent_t(n=20000, df=7, mu=0, sigma=1.5) +
ggdist::rstudent_t(n=20000, df=7, mu=0, sigma=1.5))) |>
mcmc_hist() +
xlim(c(0,1)) +
labs(title='student_t(7, 0, 1.5) prior on Intercept and b_grp2treatment')
In this case, with relatively large treatment and control groups, the likelihood is informative, and the difference between using y | trials(N) ~ 0 + grp2 or y | trials(N) ~ grp2 is negligible.
A third option would be a hierarchical model with the formula y | trials(N) ~ 1 + (1 | grp2), which is equivalent to y | trials(N) ~ (1 | grp2), and corresponds to a model \(\mathrm{logit}(\theta) = \alpha \times 1 + \beta_\mathrm{control}\times x_\mathrm{control} + \beta_\mathrm{treatment}\times x_\mathrm{treatment}\), but now the prior on \(\beta_\mathrm{control}\) and \(\beta_\mathrm{treatment}\) is \(\mathrm{normal}(0, \sigma_\mathrm{grp})\). The default brms prior for \(\sigma_\mathrm{grp}\) is student_t(3, 0, 2.5). Now \(\alpha\) models the overall probability of death (via the logistic link), and \(\beta_\mathrm{control}\) and \(\beta_\mathrm{treatment}\) model the differences from it, with the same prior for both. This prior includes the unknown scale \(\sigma_\mathrm{grp}\). If there is no difference between the control and treatment groups, the posterior of \(\sigma_\mathrm{grp}\) has more mass near 0, and the bigger the difference between the groups, the more mass there is away from 0. With just two groups, there is not much information about \(\sigma_\mathrm{grp}\), and unless there is an informative prior on \(\sigma_\mathrm{grp}\), a two-group hierarchical model is not that useful. Hierarchical models are more useful with more than two groups. In the following, we use the previously used student_t(7, 0, 1.5) prior on the intercept and the default brms prior student_t(3, 0, 2.5) on \(\sigma_\mathrm{grp}\).
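To see what this hierarchical prior implies for the comparison of interest, here is a prior simulation in base R: \(\sigma_\mathrm{grp}\) is drawn from a half student_t(3, 0, 2.5) (the default above, restricted to positive values), the two group effects from normal(0, \(\sigma_\mathrm{grp}\)), and we look at the implied prior on the log odds ratio \(\beta_\mathrm{treatment} - \beta_\mathrm{control}\), in which \(\alpha\) cancels.

```r
set.seed(48927)
n <- 20000
# Half student_t(3, 0, 2.5): absolute value of a scaled t draw
sigma_grp <- abs(2.5 * rt(n, df = 3))
# Both group-level effects share the same normal(0, sigma_grp) prior
beta_control <- rnorm(n, mean = 0, sd = sigma_grp)
beta_treatment <- rnorm(n, mean = 0, sd = sigma_grp)
log_or <- beta_treatment - beta_control
# Implied prior quantiles of the log odds ratio and of the odds ratio
round(quantile(log_or, probs = c(0.05, 0.25, 0.5, 0.75, 0.95)), 2)
round(quantile(exp(log_or), probs = c(0.05, 0.25, 0.5, 0.75, 0.95)), 2)
```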
fit_bin2 <- brm(y | trials(N) ~ 1 + (1 | grp2), family = binomial(), data = data_bin2,
prior = prior(student_t(7, 0,1.5), class='Intercept'),
seed = SEED, refresh = 0, control=list(adapt_delta=0.99))
Check the summary of the posterior and inference diagnostics. The summary reports that there are Group-Level Effects: ~grp2 with 2 levels (control and treatment), with sd(Intercept) denoting \(\sigma_\mathrm{grp}\). In addition, the summary lists Population-Level Effects: Intercept (\(\alpha\)) as in the previous non-hierarchical models.
fit_bin2
Warning: There were 1 divergent transitions after warmup. Increasing
adapt_delta above 0.99 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
Family: binomial
Links: mu = logit
Formula: y | trials(N) ~ 1 + (1 | grp2)
Data: data_bin2 (Number of observations: 2)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Group-Level Effects:
~grp2 (Number of levels: 2)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.69 1.57 0.15 5.69 1.01 538 1113
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -2.18 1.28 -3.85 1.01 1.01 569 1027
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
We can also look at the variable names brms uses internally
as_draws_rvars(fit_bin2)
# A draws_rvars: 1000 iterations, 4 chains, and 5 variables
$b_Intercept: rvar<1000,4>[1] mean ± sd:
[1] -2.2 ± 1.3
$sd_grp2__Intercept: rvar<1000,4>[1] mean ± sd:
[1] 1.7 ± 1.6
$r_grp2: rvar<1000,4>[2,1] mean ± sd:
Intercept
control -0.63 ± 1.3
treatment -1.19 ± 1.3
$lprior: rvar<1000,4>[1] mean ± sd:
[1] -4.3 ± 0.74
$lp__: rvar<1000,4>[1] mean ± sd:
[1] -13 ± 1.8
Although there is no practical difference here, we illustrate how to compute the oddsratio from the hierarchical model.
draws_bin2 <- as_draws_df(fit_bin2)
oddsratio <- draws_bin2 |>
mutate_variables(theta_control = plogis(b_Intercept + `r_grp2[control,Intercept]`),
theta_treatment = plogis(b_Intercept + `r_grp2[treatment,Intercept]`),
oddsratio = (theta_treatment/(1-theta_treatment))/(theta_control/(1-theta_control))) |>
subset_draws(variable='oddsratio')
oddsratio |> mcmc_hist() +
scale_x_continuous(breaks=seq(0.2,1.6,by=0.2))+
geom_vline(xintercept=1, linetype='dashed')
Also perform a prior sensitivity analysis with focus on oddsratio.
powerscale_sensitivity(fit_bin2, prediction = \(x, ...) oddsratio, num_args=list(digits=2)
)$sensitivity |>
filter(variable=='oddsratio') |>
mutate(across(where(is.double), ~num(.x, digits=2)))
# A tibble: 1 × 4
variable prior likelihood diagnosis
<chr> <num:.2!> <num:.2!> <chr>
1 oddsratio 0.00 0.16 -
We use the Kilpisjärvi summer month temperature data for 1952–2022 from the aaltobda package.
load(url('https://github.com/avehtari/BDA_course_Aalto/raw/master/rpackage/data/kilpisjarvi2022.rda'))
data_lin <- data.frame(year = kilpisjarvi2022$year,
temp = kilpisjarvi2022$temp.summer)
Plot the data
data_lin |>
ggplot(aes(year, temp)) +
geom_point(color=2) +
labs(x= "Year", y = 'Summer temp. @Kilpisjärvi') +
guides(linetype = "none")
To analyse whether there has been a change in the average summer month temperature, we use a linear model with a Gaussian model for the unexplained variation. By default, brms uses a uniform (flat) prior for the coefficients.
Formula temp ~ year corresponds to the model \(\mathrm{temp} \sim \mathrm{normal}(\alpha + \beta \times \mathrm{year}, \sigma)\). The model could also be defined as `temp ~ 1 + year`, which explicitly shows the intercept (\(\alpha\)) part. Using the variable names brms uses, the model can also be written as temp ~ normal(b_Intercept*1 + b_year*year, sigma). We start with the default priors to see some tricks that brms does behind the curtain.
fit_lin <- brm(temp ~ year, data = data_lin, family = gaussian(),
seed = SEED, refresh = 0)
Check the summary of the posterior and inference diagnostics.
fit_lin
Family: gaussian
Links: mu = identity; sigma = identity
Formula: temp ~ year
Data: data_lin (Number of observations: 71)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -34.69 12.49 -58.73 -10.19 1.00 3995 3035
year 0.02 0.01 0.01 0.03 1.00 3996 3035
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 1.08 0.09 0.91 1.28 1.00 3057 3011
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Convergence diagnostics look good. We see that the posterior mean of the Intercept is -34.7, which may sound strange, but that is the intercept at year 0, that is, very far from the data range, and thus it does not have a directly meaningful interpretation. The posterior mean of the year coefficient is 0.02, that is, we estimate that the summer temperature is increasing 0.02°C per year (which would make 1°C in 50 years).
We can check \(R^2\), which corresponds to the proportion of variance explained by the model. The linear model explains 0.16 = 16% of the total data variance.
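The relation between the intercept at year 0 and an intercept at the centre of the data can be seen with ordinary least squares on hypothetical data (simulated below, not the Kilpisjärvi measurements): centering year changes the intercept but not the slope, and the centered intercept equals the raw intercept plus the slope times mean(year).

```r
set.seed(48927)
year_sim <- 1952:2022
# Hypothetical temperatures: level 9.5 at the mean year, trend 0.02 per year
temp_sim <- 9.5 + 0.02 * (year_sim - mean(year_sim)) + rnorm(length(year_sim), sd = 1)
fit_raw <- lm(temp_sim ~ year_sim)                          # intercept at year 0
fit_centered <- lm(temp_sim ~ I(year_sim - mean(year_sim)))  # intercept at mean(year) = 1987
coef(fit_raw)
coef(fit_centered)
# Slope times 50 gives the implied change over 50 years
50 * coef(fit_raw)[["year_sim"]]
```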
bayes_R2(fit_lin) |> round(2)
Estimate Est.Error Q2.5 Q97.5
R2 0.16 0.07 0.03 0.3
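bayes_R2() computes, for each posterior draw, the variance of the predicted means divided by that variance plus the variance of the residuals, giving a posterior distribution of \(R^2\). A base-R sketch of the same ratio for a single least-squares fit on hypothetical data (one fit playing the role of one draw); for least squares with an intercept it coincides with the classical \(R^2\).

```r
set.seed(48927)
year_sim <- 1952:2022
# Hypothetical data, not the Kilpisjärvi measurements
temp_sim <- 9.5 + 0.02 * (year_sim - mean(year_sim)) + rnorm(length(year_sim), sd = 1)
fit <- lm(temp_sim ~ year_sim)
mu <- fitted(fit)      # plays the role of one draw of the linear predictor
e <- temp_sim - mu     # residuals for that draw
r2 <- var(mu) / (var(mu) + var(e))
c(r2 = r2, classical = summary(fit)$r.squared)
```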
We can check all the priors used.
prior_summary(fit_lin)
prior class coef group resp dpar nlpar lb ub source
(flat) b default
(flat) b year (vectorized)
student_t(3, 9.5, 2.5) Intercept default
student_t(3, 0, 2.5) sigma 0 default
We see that class=b and coef=year have a flat, that is, improper uniform prior, Intercept has a student_t(3, 9.5, 2.5) prior, and sigma has a student_t(3, 0, 2.5) prior. In general, it is good to use proper priors, but sometimes flat priors are fine and produce a proper posterior (like in this case). An important point here is that, by default, brms sets the prior on the Intercept after centering the covariate values (design matrix). In this case, brms uses year - mean(year) = year - 1987 instead of the original years. This generally improves sampling efficiency. As the Intercept is now defined at the middle of the data, the default Intercept prior is centered on the median of the target (here the target is temp). If we would like to set an informative prior, we need to set it on the Intercept given the centered covariate values. We can turn off the centering by setting the argument center=FALSE, and we can set the prior on the original intercept by using the formula temp ~ 0 + Intercept + year.

In this case, we are happy with the default prior for the intercept. In this specific case, the flat prior on the coefficient is also fine, but we add a weakly informative prior just for illustration. Let’s assume we expect the temperature to change less than 1°C in 10 years. With student_t(3, 0, 0.03), about 95% of the prior mass has less than 0.1°C change per year, and with low degrees of freedom (3) we have thick tails, making the likelihood dominate in case of prior-data conflict. In real life, we do have much more information about the temperature change, and naturally a hierarchical spatio-temporal model with all temperature measurement locations would be even better.
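The prior claims above can be checked with the Student-t quantile and distribution functions in base R, using the fact that a student_t(nu, 0, s) variable is s times a standard t variable with nu degrees of freedom:

```r
# Central 95% interval of student_t(3, 0, 0.03): scale times Student-t quantiles
ci95 <- 0.03 * qt(c(0.025, 0.975), df = 3)
ci95       # about -0.095 to 0.095 degrees C per year
10 * ci95  # the same interval expressed as change over 10 years
# Prior probability that the change is less than 0.1 degrees C per year
p_within <- 2 * pt(0.1 / 0.03, df = 3) - 1
p_within   # about 0.955
```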
fit_lin <- brm(temp ~ year, data = data_lin, family = gaussian(),
prior = prior(student_t(3, 0, 0.03), class='b'),
seed = SEED, refresh = 0)
Check the summary of the posterior and inference diagnostics.
fit_lin
Family: gaussian
Links: mu = identity; sigma = identity
Formula: temp ~ year
Data: data_lin (Number of observations: 71)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -32.54 12.28 -56.70 -9.01 1.00 4183 3259
year 0.02 0.01 0.01 0.03 1.00 4182 3259
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 1.08 0.09 0.92 1.27 1.00 3494 2709
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Perform a prior sensitivity analysis by power-scaling both the prior and the likelihood.
powerscale_sensitivity(fit_lin)$sensitivity |>
mutate(across(where(is.double), ~num(.x, digits=2)))
# A tibble: 3 × 4
variable prior likelihood diagnosis
<chr> <num:.2!> <num:.2!> <chr>
1 b_Intercept 0.03 0.09 -
2 b_year 0.03 0.09 -
3 sigma 0.00 0.13 -
Our weakly informative proper prior has negligible sensitivity, and the likelihood is informative. Extract the posterior draws and check the summaries.
draws_lin <- as_draws_df(fit_lin)
draws_lin |> summarise_draws()
# A tibble: 5 × 10
variable mean median sd mad q5 q95 rhat ess_bulk
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept -3.25e+1 -3.24e+1 1.23e+1 1.24e+1 -5.29e+1 -1.29e+1 1.00 4183.
2 b_year 2.11e-2 2.11e-2 6.18e-3 6.22e-3 1.12e-2 3.14e-2 1.00 4182.
3 sigma 1.08e+0 1.07e+0 9.14e-2 9.08e-2 9.43e-1 1.24e+0 1.00 3494.
4 lprior -1.08e+0 -1.06e+0 1.65e-1 1.65e-1 -1.38e+0 -8.51e-1 1.00 4173.
5 lp__ -1.07e+2 -1.06e+2 1.21e+0 9.72e-1 -1.09e+2 -1.05e+2 1.00 1899.
# ℹ 1 more variable: ess_tail <dbl>
If some columns are hidden, we can force printing all columns.
draws_lin |> summarise_draws() |> print(width=Inf)
# A tibble: 5 × 10
variable mean median sd mad q5 q95 rhat
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept -32.5 -32.4 12.3 12.4 -52.9 -12.9 1.00
2 b_year 0.0211 0.0211 0.00618 0.00622 0.0112 0.0314 1.00
3 sigma 1.08 1.07 0.0914 0.0908 0.943 1.24 1.00
4 lprior -1.08 -1.06 0.165 0.165 -1.38 -0.851 1.00
5 lp__ -107. -106. 1.21 0.972 -109. -105. 1.00
ess_bulk ess_tail
<dbl> <dbl>
1 4183. 3259.
2 4182. 3259.
3 3494. 2709.
4 4173. 3285.
5 1899. 2576.
Histogram of b_year
draws_lin |>
mcmc_hist(pars='b_year') +
xlab('Average temperature increase per year')
Probability that the coefficient b_year > 0 and the corresponding MCSE
draws_lin |>
mutate(I_b_year_gt_0 = b_year>0) |>
subset_draws(variable='I_b_year_gt_0') |>
summarise_draws(mean, mcse_mean)
# A tibble: 1 × 3
variable mean mcse_mean
<chr> <dbl> <dbl>
1 I_b_year_gt_0 1 NA
All posterior draws have b_year>0, so the probability estimate is 1 (the probability is effectively rounded to 1 given the finite number of draws), and MCSE is not available as the observed variance of the indicator is 0.
95% posterior interval for temperature increase per 100 years
draws_lin |>
mutate(b_year_100 = b_year*100) |>
subset_draws(variable='b_year_100') |>
summarise_draws(~quantile(.x, probs = c(0.025, 0.975)),
~mcse_quantile(.x, probs = c(0.025, 0.975)),
.num_args = list(digits = 2, notation = "dec"))
# A tibble: 1 × 5
variable `2.5%` `97.5%` mcse_q2.5 mcse_q97.5
<chr> <dbl> <dbl> <dbl> <dbl>
1 b_year_100 0.93 3.33 0.03 0.03
Plot posterior draws of the linear function values at each year. add_linpred_draws() takes the years from the data and uses fit_lin to make the predictions.
data_lin |>
add_linpred_draws(fit_lin) |>
# plot data
ggplot(aes(x=year, y=temp)) +
geom_point(color=2) +
# plot lineribbon for the linear model
stat_lineribbon(aes(y = .linpred), .width = c(.95), alpha = 1/2, color=brewer.pal(5, "Blues")[[5]]) +
# decoration
scale_fill_brewer()+
labs(x= "Year", y = 'Summer temp. @Kilpisjärvi') +
theme(legend.position="none")+
scale_x_continuous(breaks=seq(1950,2020,by=10))
Alternatively, plot a spaghetti plot of 100 draws.
data_lin |>
add_linpred_draws(fit_lin, ndraws=100) |>
# plot data
ggplot(aes(x=year, y=temp)) +
geom_point(color=2) +
# plot a line for each posterior draw
geom_line(aes(y=.linpred, group=.draw), alpha = 1/2, color = brewer.pal(5, "Blues")[[3]])+
# decoration
scale_fill_brewer()+
labs(x= "Year", y = 'Summer temp. @Kilpisjärvi') +
theme(legend.position="none")+
scale_x_continuous(breaks=seq(1950,2020,by=10))
Plot the posterior predictive distribution at each year until 2030. add_predicted_draws() takes the years from the data, extended with the years 2023–2030, and uses fit_lin to make the predictions.
data_lin |>
add_row(year=2023:2030) |>
add_predicted_draws(fit_lin) |>
# plot data
ggplot(aes(x=year, y=temp)) +
geom_point(color=2) +
# plot lineribbon for the linear model
stat_lineribbon(aes(y = .prediction), .width = c(.95), alpha = 1/2, color=brewer.pal(5, "Blues")[[5]]) +
# decoration
scale_fill_brewer()+
labs(x= "Year", y = 'Summer temp. @Kilpisjärvi') +
theme(legend.position="none")+
scale_x_continuous(breaks=seq(1950,2030,by=10))
Warning: Removed 32000 rows containing missing values (`geom_point()`).
Posterior predictive check with density overlays examines the whole temperature distribution
pp_check(fit_lin, type='dens_overlay', ndraws=20)
The LOO-PIT check is useful for checking whether the normal distribution describes the variation well, as it examines the calibration of the LOO predictive distributions conditionally on each year. The LOO-PIT plot looks good.
pp_check(fit_lin, type='loo_pit_qq', ndraws=4000)
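The idea behind a PIT check can be sketched without a fitted model: if the predictive distribution for each observation is calibrated, the probability integral transform (PIT) values are uniform, and if it is too wide, they pile up around 0.5. This base-R sketch uses simulated data and known normal predictive distributions, not the leave-one-out predictive distributions that the LOO-PIT check uses.

```r
set.seed(48927)
n_sim <- 1000
y_sim <- rnorm(n_sim, mean = 9.5, sd = 1.1)         # hypothetical observations
pit_good <- pnorm(y_sim, mean = 9.5, sd = 1.1)      # calibrated predictive distribution
pit_wide <- pnorm(y_sim, mean = 9.5, sd = 2.2)      # predictive distribution twice too wide
# Share of PIT values in the middle half (0.25, 0.75); about 0.5 if calibrated
mid_share <- function(p) mean(p > 0.25 & p < 0.75)
c(good = mid_share(pit_good), wide = mid_share(pit_wide))
```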
The temperatures used in the above analyses are averages over three months, which makes it more likely that they are normally distributed, but there can be extreme events in the weather, and we can check whether a more robust Student’s \(t\) observation model would give different results (although the LOO-PIT check already indicated that the normal would be good).
fit_lin_t <- brm(temp ~ year, data = data_lin, family = student(),
prior = prior(student_t(3, 0, 0.03), class='b'),
seed = SEED, refresh = 0)
Check the summary of the posterior and inference diagnostics. The b_year posterior looks similar to before, and the posterior for the degrees of freedom nu has most of its mass on quite large values, indicating that there is no strong support for thick-tailed variation in average summer temperatures.
fit_lin_t
Family: student
Links: mu = identity; sigma = identity; nu = identity
Formula: temp ~ year
Data: data_lin (Number of observations: 71)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -34.01 12.27 -58.50 -9.31 1.00 3979 2893
year 0.02 0.01 0.01 0.03 1.00 3979 2923
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 1.03 0.10 0.86 1.24 1.00 3209 2302
nu 24.54 14.36 6.36 60.80 1.00 2972 2325
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
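Why large values of nu indicate near-normal variation: the tail probability of Student's \(t\) approaches that of the normal as nu grows. A base-R sketch comparing the probability of falling more than 3 scale units from the mean, for nu values roughly at the ends of the 95% interval (6.4, 60.8) and the posterior mean (24.5) reported above, plus nu = 3 for contrast:

```r
nu <- c(3, 6.4, 24.5, 60.8)
# Two-sided tail probability beyond 3 scale units for Student's t with df = nu
tail_t <- 2 * pt(-3, df = nu)
# The same for the normal distribution
tail_normal <- 2 * pnorm(-3)
round(c(setNames(tail_t, paste0("nu=", nu)), normal = tail_normal), 4)
```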
We can use leave-one-out cross-validation to compare the expected predictive performance.
LOO comparison shows that the normal and Student’s \(t\) models have similar performance.
loo_compare(loo(fit_lin), loo(fit_lin_t))
elpd_diff se_diff
fit_lin 0.0 0.0
fit_lin_t -0.4 0.3
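In loo_compare() output, elpd_diff is the sum of the pointwise differences in expected log predictive density, and se_diff is computed from the spread of those pointwise differences as sqrt(n) * sd(diffs). A sketch of that computation with hypothetical pointwise values (for real fits they are in the pointwise element of the loo objects):

```r
set.seed(48927)
n_obs <- 71
# Hypothetical pointwise elpd values for two models fitted to the same 71 observations
elpd_a <- rnorm(n_obs, mean = -1.5, sd = 0.8)
elpd_b <- elpd_a + rnorm(n_obs, mean = -0.005, sd = 0.05)
diffs <- elpd_b - elpd_a
elpd_diff <- sum(diffs)
se_diff <- sqrt(n_obs) * sd(diffs)
c(elpd_diff = elpd_diff, se_diff = se_diff, ratio = elpd_diff / se_diff)
```

A difference that is small relative to se_diff, as in the comparison above, does not indicate a clear difference in predictive performance.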
Heteroskedasticity means that the magnitude of the variation around the linear mean can also vary. We can allow sigma to depend on year, too. Although the additional component is written as sigma ~ year, the log link function is used and the model is for log(sigma). bf() allows listing several formulas.
fit_lin_h <- brm(bf(temp ~ year,
sigma ~ year),
data = data_lin, family = gaussian(),
prior = prior(student_t(3, 0, 0.03), class='b'),
seed = SEED, refresh = 0)
Check the summary of the posterior and inference diagnostics. The b_year posterior looks similar to before. The posterior for sigma_year has most of its mass on negative values, indicating a decrease in temperature variation around the mean.
fit_lin_h
Family: gaussian
Links: mu = identity; sigma = log
Formula: temp ~ year
sigma ~ year
Data: data_lin (Number of observations: 71)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -36.37 12.49 -61.25 -10.49 1.00 3412 2842
sigma_Intercept 19.10 8.69 1.56 35.80 1.00 3818 2899
year 0.02 0.01 0.01 0.04 1.00 3426 2885
sigma_year -0.01 0.00 -0.02 -0.00 1.00 3810 2855
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior densities of b_year and b_sigma_year
as_draws_df(fit_lin_h) |>
mcmc_areas(pars=c('b_year', 'b_sigma_year'))
As exp(x) ≈ 1 + x when x is close to zero, a sigma_year coefficient of about -0.01 on the log(sigma) scale means that sigma is decreasing by about 1% per year (95% interval from 0% to 2%).
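The 1% reading relies on the approximation exp(x) ≈ 1 + x for x near zero. A coefficient beta on log(sigma) multiplies sigma by exp(beta) per year. The following quick numeric check uses illustrative beta values.

```r
# Coefficients on the log(sigma) scale (per year), illustrative values
beta <- c(-0.02, -0.01, 0)
# Exact multiplicative change in sigma per year, in percent
exact_pct <- 100 * (exp(beta) - 1)
# First-order approximation exp(x) ~ 1 + x
approx_pct <- 100 * beta
round(cbind(beta, exact_pct, approx_pct), 3)
```

With posterior draws, the same transformation can be applied per draw. For example, 100 * (exp(as_draws_df(fit_lin_h)$b_sigma_year) - 1) gives the posterior of the yearly percentage change.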
Plot the posterior predictive distribution at each year until 2030. add_predicted_draws() takes the years from the data and uses fit_lin_h to make the predictions.
data_lin |>
add_row(year=2023:2030) |>
add_predicted_draws(fit_lin_h) |>
# plot data
ggplot(aes(x=year, y=temp)) +
geom_point(color=2) +
# plot lineribbon for the linear model
stat_lineribbon(aes(y = .prediction), .width = c(.95), alpha = 1/2, color=brewer.pal(5, "Blues")[[5]]) +
# decoration
scale_fill_brewer()+
labs(x= "Year", y = 'Summer temp. @Kilpisjärvi') +
theme(legend.position="none")+
scale_x_continuous(breaks=seq(1950,2030,by=10))
Warning: Removed 32000 rows containing missing values (`geom_point()`).
Make a prior sensitivity analysis by power-scaling both the prior and the likelihood.
powerscale_sensitivity(fit_lin_h)$sensitivity |>
mutate(across(where(is.double), ~num(.x, digits=2)))
# A tibble: 4 × 4
variable prior likelihood diagnosis
<chr> <num:.2!> <num:.2!> <chr>
1 b_Intercept 0.03 0.11 -
2 b_sigma_Intercept 0.00 0.10 -
3 b_year 0.03 0.11 -
4 b_sigma_year 0.00 0.11 -
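powerscale_sensitivity() perturbs the prior (or the likelihood) by raising it to a power alpha close to 1. It then measures how much the posterior changes. For a normal prior, power-scaling has a simple closed form: normal(0, s)^alpha is proportional to normal(0, s / sqrt(alpha)), so alpha < 1 widens the prior. The sketch below checks this numerically with arbitrary s and alpha. It illustrates the perturbation only, not how priorsense estimates its effect without refitting.

```r
# Power-scale a normal(0, s) density by alpha and compare with normal(0, s / sqrt(alpha))
s <- 1.5
alpha <- 0.8
x <- seq(-4, 4, by = 0.5)
scaled_unnorm <- dnorm(x, 0, s)^alpha
target <- dnorm(x, 0, s / sqrt(alpha))
# A constant ratio over x means the two densities are proportional
ratio <- scaled_unnorm / target
range(ratio)
```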
We can use leave-one-out cross-validation to compare the expected predictive performance.
LOO comparison shows that the homoskedastic and heteroskedastic normal models have similar performance.
loo_compare(loo(fit_lin), loo(fit_lin_h))
elpd_diff se_diff
fit_lin_h 0.0 0.0
fit_lin -1.6 1.6
We can test the linearity assumption by using non-linear spline functions, with s(year) terms. Sampling is slower as the posterior gets more complex.
fit_spline_h <- brm(bf(temp ~ s(year),
sigma ~ s(year)),
data = data_lin, family = gaussian(),
seed = SEED, refresh = 0)
We get warnings about divergences, so we rerun with a higher adapt_delta, which leads to smaller step sizes. Often adapt_delta=0.999 leads to very slow sampling, but with this small dataset it is not an issue.
fit_spline_h <- update(fit_spline_h, control = list(adapt_delta=0.999))
Check the summary of the posterior and inference diagnostics. We are no longer able to interpret the temperature increase based on this summary. For splines, the summary also shows sds(), the scale parameters of the prior on the spline coefficients.
fit_spline_h
Family: gaussian
Links: mu = identity; sigma = log
Formula: temp ~ s(year)
sigma ~ s(year)
Data: data_lin (Number of observations: 71)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Smooth Terms:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sds(syear_1) 1.07 0.98 0.04 3.61 1.00 1658 1803
sds(sigma_syear_1) 0.94 0.92 0.03 3.40 1.00 1601 1752
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 9.42 0.13 9.18 9.67 1.00 4615 2834
sigma_Intercept 0.04 0.09 -0.12 0.22 1.00 4507 2837
syear_1 2.84 2.88 -3.39 8.85 1.00 1631 1270
sigma_syear_1 -1.05 2.38 -6.41 3.77 1.00 1704 1516
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
We can still plot the posterior predictive distribution at each year until 2030. add_predicted_draws() takes the years from the data and uses fit_spline_h to make the predictions.
data_lin |>
add_row(year=2023:2030) |>
add_predicted_draws(fit_spline_h) |>
# plot data
ggplot(aes(x=year, y=temp)) +
geom_point(color=2) +
# plot lineribbon for the spline model
stat_lineribbon(aes(y = .prediction), .width = c(.95), alpha = 1/2, color=brewer.pal(5, "Blues")[[5]]) +
# decoration
scale_fill_brewer()+
labs(x= "Year", y = 'Summer temp. @Kilpisjärvi') +
theme(legend.position="none")+
scale_x_continuous(breaks=seq(1950,2030,by=10))
Warning: Removed 32000 rows containing missing values (`geom_point()`).
And we can use leave-one-out cross-validation to compare the expected predictive performance.
LOO comparison shows that the homoskedastic normal linear model and the heteroskedastic normal spline model have similar performance. There are not enough observations to make a clear difference between the models.
loo_compare(loo(fit_lin), loo(fit_spline_h))
elpd_diff se_diff
fit_spline_h 0.0 0.0
fit_lin -0.6 1.8
For spline and other non-parametric models, we can use predictive estimates and predictions to get interpretable quantities. Let’s examine the difference of estimated average temperature in years 1952 and 2022.
# Option 1: difference of expected values using posterior_epred() and rvar arithmetic
temp_diff <- posterior_epred(fit_spline_h, newdata=filter(data_lin,year==1952|year==2022)) |>
rvar() |>
diff() |>
as_draws_df() |>
set_variables('temp_diff')
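# Option 2: the same difference using tidybayes; this overwrites temp_diff from above
temp_diff <- data_lin |>
filter(year==1952|year==2022) |>
add_epred_draws(fit_spline_h) |>
pivot_wider(id_cols=.draw, names_from = year, values_from = .epred) |>
# recover chain and iteration indices, assuming 4 chains x 1000 post-warmup draws
mutate(temp_diff = `2022`-`1952`,
.chain = (.draw - 1) %/% 1000 + 1,
.iteration = (.draw - 1) %% 1000 + 1) |>
as_draws_df() |>
subset_draws(variable='temp_diff')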
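The .chain and .iteration columns above are reconstructed from the global .draw index. This lets as_draws_df() know the chain structure, which diagnostics such as Rhat and ESS use. The arithmetic assumes 4 chains with 1000 post-warmup draws each, the brms defaults used here. With other settings, n_iter has to change. The following snippet is a quick check of the mapping.

```r
# Map a global draw index to chain and iteration,
# assuming 1000 post-warmup draws per chain
n_iter <- 1000
draw <- c(1, 1000, 1001, 4000)
chain <- (draw - 1) %/% n_iter + 1
iteration <- (draw - 1) %% n_iter + 1
data.frame(draw, chain, iteration)
```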
Posterior distribution for average summer temperature increase from 1952 to 2022
temp_diff |>
mcmc_hist()
95% posterior interval for average summer temperature increase from 1952 to 2022
temp_diff |>
summarise_draws(~quantile(.x, probs = c(0.025, 0.975)),
~mcse_quantile(.x, probs = c(0.025, 0.975)),
.num_args = list(digits = 2, notation = "dec"))
# A tibble: 1 × 5
variable `2.5%` `97.5%` mcse_q2.5 mcse_q97.5
<chr> <dbl> <dbl> <dbl> <dbl>
1 temp_diff 0.52 2.62 0.04 0.02
Make a prior sensitivity analysis by power-scaling both the prior and the likelihood, focusing on the average summer temperature increase from 1952 to 2022.
powerscale_sensitivity(fit_spline_h, prediction = \(x, ...) temp_diff, num_args=list(digits=2)
)$sensitivity |>
filter(variable=='temp_diff') |>
mutate(across(where(is.double), ~num(.x, digits=2)))
# A tibble: 1 × 4
variable prior likelihood diagnosis
<chr> <num:.2!> <num:.2!> <chr>
1 temp_diff 0.01 0.08 -
The probability that the average summer temperature has increased from 1952 to 2022 is 99.7%.
temp_diff |>
mutate(I_temp_diff_gt_0 = temp_diff>0,
temp_diff = NULL) |>
subset_draws(variable='I_temp_diff_gt_0') |>
summarise_draws(mean, mcse_mean)
# A tibble: 1 × 3
variable mean mcse_mean
<chr> <dbl> <dbl>
1 I_temp_diff_gt_0 0.997 0.00126
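The reported probability is the mean of a 0/1 indicator over the posterior draws, and its Monte Carlo standard error depends on the effective sample size. The following back-of-the-envelope check uses the numbers printed above: p = 0.997, 4000 draws, and mcse_mean = 0.00126. If the draws were independent, the MCSE would be sqrt(p(1 - p)/S). Inverting the reported MCSE gives the effective sample size it implies.

```r
p_hat <- 0.997          # posterior probability estimate reported above
S <- 4000               # total number of post-warmup draws
# MCSE if the draws were independent
mcse_indep <- sqrt(p_hat * (1 - p_hat) / S)
# MCSE reported by summarise_draws(), which accounts for autocorrelation
mcse_reported <- 0.00126
# Effective sample size implied by the reported MCSE
ess_implied <- p_hat * (1 - p_hat) / mcse_reported^2
round(c(mcse_indep = mcse_indep, ess_implied = ess_implied), 5)
```

The implied effective sample size is below 4000, which is expected for autocorrelated MCMC draws. The last digit of 99.7% is uncertain at the level of a few tenths of a percentage point.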
Load the factory data, which contain 5 quality measurements for each of 6 machines. We are interested in analysing the quality differences between the machines.
factory <- read.table(url('https://raw.githubusercontent.com/avehtari/BDA_course_Aalto/master/rpackage/data-raw/factory.txt'))
colnames(factory) <- 1:6
factory
1 2 3 4 5 6
1 83 117 101 105 79 57
2 92 109 93 119 97 92
3 92 114 92 116 103 104
4 46 104 86 102 79 77
5 67 87 67 116 92 100
We pivot the data to long format
factory <- factory |>
pivot_longer(cols = everything(),
names_to = 'machine',
values_to = 'quality')
factory
# A tibble: 30 × 2
machine quality
<chr> <int>
1 1 83
2 2 117
3 3 101
4 4 105
5 5 79
6 6 57
7 1 92
8 2 109
9 3 93
10 4 119
# ℹ 20 more rows
For comparison, we also fit a pooled model.
fit_pooled <- brm(quality ~ 1, data = factory, refresh=0)
Check the summary of the posterior and inference diagnostics.
fit_pooled
Family: gaussian
Links: mu = identity; sigma = identity
Formula: quality ~ 1
Data: factory (Number of observations: 30)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 92.94 3.30 86.32 99.32 1.00 2691 1998
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 18.35 2.48 14.39 24.04 1.00 2610 2241
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
For comparison, we also fit a separate model. To make it completely separate, we need a different sigma for each machine, too.
fit_separate <- brm(bf(quality ~ 0 + machine,
sigma ~ 0 + machine),
data = factory, refresh=0)
Check the summary of the posterior and inference diagnostics.
fit_separate
Family: gaussian
Links: mu = identity; sigma = log
Formula: quality ~ 0 + machine
sigma ~ 0 + machine
Data: factory (Number of observations: 30)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
machine1 75.60 11.39 52.88 99.06 1.00 2497 1912
machine2 106.24 7.61 91.38 121.11 1.00 1920 1344
machine3 87.99 8.50 71.84 104.21 1.00 1373 1087
machine4 111.52 4.65 102.24 120.78 1.00 2146 1465
machine5 89.93 6.92 75.98 103.37 1.00 1897 1342
machine6 85.01 13.91 56.11 109.70 1.00 1493 765
sigma_machine1 3.10 0.39 2.45 4.00 1.00 2370 2189
sigma_machine2 2.61 0.41 1.97 3.57 1.00 1825 1143
sigma_machine3 2.69 0.41 2.04 3.60 1.00 1879 1461
sigma_machine4 2.17 0.40 1.52 3.10 1.00 2398 1274
sigma_machine5 2.52 0.40 1.90 3.42 1.00 1908 1501
sigma_machine6 3.12 0.43 2.43 4.11 1.00 1648 805
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
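In the separate model, the sigma_machine* coefficients are on the log scale (Links: sigma = log). Because exp() is monotone, the 95% interval endpoints can be transformed directly to the sigma scale. Posterior means cannot be transformed this way, since exp(E[x]) is not E[exp(x)]. The following snippet uses the interval endpoints for machines 1 and 4 from the table above.

```r
# 95% interval endpoints of log(sigma) from the summary of fit_separate
log_sigma_ci <- rbind(machine1 = c(lower = 2.45, upper = 4.00),
                      machine4 = c(lower = 1.52, upper = 3.10))
# Quantiles map through a monotone transformation
sigma_ci <- exp(log_sigma_ci)
round(sigma_ci, 1)
```

For comparison, the pooled model has a single sigma with a 95% interval from 14.39 to 24.04.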
fit_hier <- brm(quality ~ 1 + (1 | machine),
data = factory, refresh = 0)
Check the summary of the posterior and inference diagnostics.
fit_hier
Warning: There were 1 divergent transitions after warmup. Increasing
adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
Family: gaussian
Links: mu = identity; sigma = identity
Formula: quality ~ 1 + (1 | machine)
Data: factory (Number of observations: 30)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Group-Level Effects:
~machine (Number of levels: 6)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 12.82 6.29 2.90 27.59 1.01 865 891
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 92.91 6.24 80.89 106.27 1.01 1125 678
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 15.15 2.36 11.23 20.43 1.00 2023 2306
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
LOO comparison shows the hierarchical model is the best. The differences are small as the number of observations is small and there is a considerable prediction (aleatoric) uncertainty.
loo_compare(loo(fit_pooled), loo(fit_separate), loo(fit_hier))
Warning: Found 3 observations with a pareto_k > 0.7 in model 'fit_separate'. It
is recommended to set 'moment_match = TRUE' in order to perform moment matching
for problematic observations.
elpd_diff se_diff
fit_hier 0.0 0.0
fit_separate -2.9 2.7
fit_pooled -3.8 2.0
Posterior distributions of the mean quality under the different models. The pooled model ignores the variation between machines. The separate model does not benefit from the similarity of the machines and has higher uncertainty.
ph <- fit_hier |>
spread_rvars(b_Intercept, r_machine[machine,]) |>
mutate(machine_mean = b_Intercept + r_machine) |>
ggplot(aes(xdist=machine_mean, y=machine)) +
stat_halfeye() +
scale_y_continuous(breaks=1:6) +
labs(x='Quality', y='Machine', title='Hierarchical')
ps <- fit_separate |>
as_draws_df() |>
subset_draws(variable='b_machine', regex=TRUE) |>
set_variables(paste0('b_machine[', 1:6, ']')) |>
as_draws_rvars() |>
spread_rvars(b_machine[machine]) |>
mutate(machine_mean = b_machine) |>
ggplot(aes(xdist=machine_mean, y=machine)) +
stat_halfeye() +
scale_y_continuous(breaks=1:6) +
labs(x='Quality', y='Machine', title='Separate')
pp <- fit_pooled |>
spread_rvars(b_Intercept) |>
mutate(machine_mean = b_Intercept) |>
ggplot(aes(xdist=machine_mean, y=0)) +
stat_halfeye() +
scale_y_continuous(breaks=NULL) +
labs(x='Quality', y='All machines', title='Pooled')
(pp / ps / ph) * xlim(c(50,140))
Warning: Removed 792 rows containing missing values (`geom_slabinterval()`).
Warning: Removed 2 rows containing missing values (`geom_slabinterval()`).
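Partial pooling in the hierarchical model can be illustrated with the normal-normal conditional posterior. Given the population mean mu, the between-machine sd tau and the measurement sd sigma, each machine mean is a precision-weighted average of that machine's sample mean and mu. The sketch below plugs in the posterior means printed for fit_hier (mu = 92.91, tau = 12.82, sigma = 15.15) as if they were known. This is a simplification, because brms integrates over their uncertainty instead.

```r
# Quality measurements from the factory table (columns = machines 1..6)
y <- matrix(c(83, 92, 92, 46, 67,
              117, 109, 114, 104, 87,
              101, 93, 92, 86, 67,
              105, 119, 116, 102, 116,
              79, 97, 103, 79, 92,
              57, 92, 104, 77, 100),
            nrow = 5, ncol = 6)
n <- nrow(y)
ybar <- colMeans(y)
# Plug-in values (posterior means from fit_hier), treated as known
mu <- 92.91; tau <- 12.82; sigma <- 15.15
# Weight on the machine's own mean: data precision / total precision
w <- (n / sigma^2) / (n / sigma^2 + 1 / tau^2)
shrunk <- w * ybar + (1 - w) * mu
round(rbind(separate = ybar, hierarchical = shrunk), 1)
```

Because every machine has 5 measurements, all machine means are pulled towards mu by the same factor in this sketch.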
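Make a prior sensitivity analysis by power-scaling both the prior and the likelihood, focusing on the mean quality of each machine. The sensitivity values are small (at most 0.06), but some slightly exceed the default sensitivity threshold of powerscale_sensitivity() and are therefore flagged in the diagnosis column.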
machine_mean <- fit_hier |>
as_draws_df() |>
# machine mean = population intercept + machine-specific deviation
mutate(across(matches('r_machine'), ~ .x + b_Intercept)) |>
subset_draws(variable='r_machine', regex=TRUE) |>
set_variables(paste0('machine_mean[', 1:6, ']'))
powerscale_sensitivity(fit_hier, prediction = \(x, ...) machine_mean, num_args=list(digits=2)
)$sensitivity |>
filter(str_detect(variable,'machine_mean')) |>
mutate(across(where(is.double), ~num(.x, digits=2)))
# A tibble: 6 × 4
variable prior likelihood diagnosis
<chr> <num:.2!> <num:.2!> <chr>
1 machine_mean[1] 0.06 0.09 prior-data conflict
2 machine_mean[2] 0.06 0.07 prior-data conflict
3 machine_mean[3] 0.05 0.03 weak likelihood
4 machine_mean[4] 0.04 0.10 -
5 machine_mean[5] 0.06 0.03 weak likelihood
6 machine_mean[6] 0.06 0.04 weak likelihood
The Sorafenib Toxicity Dataset in the metadat R package includes results from 13 studies investigating the occurrence of dose-limiting toxicities (DLTs) at different doses of Sorafenib.
Load data
load(url('https://github.com/wviechtb/metadat/raw/master/data/dat.ursino2021.rda'))
head(dat.ursino2021)
study year dose events total
1 Awada 2005 100 0 4
2 Awada 2005 200 0 3
3 Awada 2005 300 1 5
4 Awada 2005 400 1 10
5 Awada 2005 600 7 12
6 Awada 2005 800 1 3
Number of patients per study
dat.ursino2021 |>
group_by(study) |>
summarise(N = sum(total)) |>
ggplot(aes(x=N, y=study)) +
geom_col(fill=4) +
labs(x='Number of patients per study', y='Study')
Distribution of doses
dat.ursino2021 |>
ggplot(aes(x=dose)) +
geom_histogram(breaks=seq(50,1050,by=100), fill=4, colour=1) +
labs(x='Dose (mg)', y='Count') +
scale_x_continuous(breaks=seq(100,1000,by=100))
Each study uses 2 to 6 different dose levels. The three studies that include only two dose levels are likely to provide weak information on the slope.
crosstab <- with(dat.ursino2021,table(dose,study))
data.frame(count=colSums(crosstab), study=colnames(crosstab)) |>
ggplot(aes(x=count, y=study)) +
geom_col(fill=4) +
labs(x='Number of dose levels per study', y='Study')
Pooled model assumes all studies have the same dose effect (reminder: ~ dose is equivalent to ~ 1 + dose)
fit_pooled <- brm(events | trials(total) ~ dose,
prior = c(prior(student_t(7, 0, 1.5), class='Intercept'),
prior(normal(0, 1), class='b')),
family=binomial(), data=dat.ursino2021)
Check the summary of the posterior and inference diagnostics.
fit_pooled
Family: binomial
Links: mu = logit
Formula: events | trials(total) ~ dose
Data: dat.ursino2021 (Number of observations: 49)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -3.18 0.38 -3.96 -2.47 1.00 1091 1571
dose 0.00 0.00 0.00 0.01 1.00 2165 2265
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The dose coefficient seems to be very small. Looking at the posterior, we see that it is positive with high probability.
fit_pooled |>
as_draws() |>
subset_draws(variable='b_dose') |>
summarise_draws(~quantile(.x, probs = c(0.025, 0.975)), ~mcse_quantile(.x, probs = c(0.025, 0.975)))
# A tibble: 1 × 5
variable `2.5%` `97.5%` mcse_q2.5 mcse_q97.5
<chr> <dbl> <dbl> <dbl> <dbl>
1 b_dose 0.00234 0.00521 0.0000264 0.0000404
The dose was reported in mg, and most values are in hundreds. It is often sensible to switch to a scale in which the range of values is closer to unit range. Here we divide the dose by 100, so that doseg is measured in units of 100 mg (for example, 400 mg becomes 4); despite its name, doseg is not in grams.
dat.ursino2021 <- dat.ursino2021 |>
mutate(doseg = dose/100)
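Rescaling a predictor only rescales its coefficient. The per-100 mg slope is 100 times the per-mg slope, consistent with the per-mg 95% interval 0.00234 to 0.00521 above. The sketch below checks this with simulated data and maximum likelihood glm() fits. This is a swapped-in, non-Bayesian shortcut, and the true values are made up. The prior is what does not rescale: the same normal(0, 1) prior is extremely wide relative to a per-mg slope but is on a sensible scale for a per-100 mg slope.

```r
set.seed(3)
dose_mg <- rep(c(100, 200, 400, 600, 800, 1000), each = 20)
# Made-up true model on the per-100 mg scale
p <- plogis(-3 + 0.4 * dose_mg / 100)
events <- rbinom(length(dose_mg), size = 1, prob = p)
# Same logistic regression with the dose in mg and in units of 100 mg
fit_mg  <- glm(events ~ dose_mg, family = binomial())
fit_100 <- glm(events ~ I(dose_mg / 100), family = binomial())
b_mg  <- unname(coef(fit_mg)[2])
b_100 <- unname(coef(fit_100)[2])
c(per_mg = b_mg, per_100mg = b_100, ratio = b_100 / b_mg)
```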
Fit the pooled model again using doseg
fit_pooled <- brm(events | trials(total) ~ doseg,
prior = c(prior(student_t(7, 0, 1.5), class='Intercept'),
prior(normal(0, 1), class='b')),
family=binomial(), data=dat.ursino2021)
Check the summary of the posterior and inference diagnostics.
fit_pooled
Family: binomial
Links: mu = logit
Formula: events | trials(total) ~ doseg
Data: dat.ursino2021 (Number of observations: 49)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -3.15 0.38 -3.95 -2.45 1.00 2037 2062
doseg 0.37 0.08 0.22 0.51 1.00 2345 2465
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Now the presented values are easier to interpret. The separate model assumes that each study has a different dose effect. It would be a bit complicated to set different priors on the study-specific intercepts and other coefficients, so we use the same prior for all.
fit_separate <- brm(events | trials(total) ~ 0 + study + doseg:study,
prior=prior(student_t(7, 0, 1.5), class='b'),
family=binomial(), data=dat.ursino2021)
Check the summary of the posterior and inference diagnostics.
fit_separate
Family: binomial
Links: mu = logit
Formula: events | trials(total) ~ 0 + study + doseg:study
Data: dat.ursino2021 (Number of observations: 49)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
studyAwada -2.68 1.04 -4.89 -0.82 1.00 6560 2752
studyBorthakurMA -2.46 1.80 -6.50 0.38 1.00 5091 2163
studyBorthakurMB -1.66 1.45 -4.68 0.80 1.00 5305 2229
studyChen -0.85 1.57 -4.11 1.97 1.00 7239 2756
studyClark -3.09 1.52 -6.46 -0.71 1.00 5146 2223
studyCrumpMA -1.51 1.20 -4.14 0.61 1.00 6321 2650
studyCrumpMB -1.87 1.15 -4.34 0.11 1.00 6445 2636
studyFuruse -1.68 1.64 -5.46 1.10 1.00 6115 2610
studyMiller -1.10 0.82 -2.74 0.45 1.00 6816 2763
studyMinami -1.98 1.18 -4.57 0.06 1.00 6543 2821
studyMoore -2.32 1.20 -4.99 -0.22 1.00 5701 2384
studyNabors -2.92 1.45 -6.33 -0.55 1.00 6375 2213
studyStrumberg -1.98 0.88 -3.86 -0.38 1.00 6906 2522
studyAwada:doseg 0.38 0.20 0.01 0.80 1.00 6249 3035
studyBorthakurMA:doseg 0.01 0.40 -0.70 0.85 1.00 5338 2291
studyBorthakurMB:doseg 0.06 0.32 -0.53 0.69 1.00 5488 2289
studyChen:doseg -0.64 0.53 -1.72 0.39 1.00 5655 2368
studyClark:doseg 0.44 0.27 -0.01 1.03 1.00 5067 2215
studyCrumpMA:doseg -0.32 0.49 -1.31 0.62 1.00 5557 2942
studyCrumpMB:doseg 0.09 0.28 -0.46 0.63 1.00 6634 2474
studyFuruse:doseg -0.54 0.62 -1.78 0.71 1.00 5738 2228
studyMiller:doseg 0.02 0.29 -0.56 0.57 1.00 6615 2320
studyMinami:doseg -0.16 0.37 -0.91 0.50 1.00 5826 2943
studyMoore:doseg 0.20 0.28 -0.32 0.78 1.00 5900 2632
studyNabors:doseg 0.31 0.20 -0.06 0.77 1.00 6246 2311
studyStrumberg:doseg 0.09 0.17 -0.23 0.42 1.00 6940 2621
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
We build two different hierarchical models. The first one has a hierarchical model for the intercept, that is, each study has a parameter telling how much that study differs from the common population intercept.
fit_hier1 <- brm(events | trials(total) ~ doseg + (1 | study),
prior=c(prior(student_t(7, 0, 1.5), class='Intercept'),
prior(normal(0, 1), class='b')),
family=binomial(), data=dat.ursino2021)
The second hierarchical model assumes that also the slope can vary between the studies.
fit_hier2 <- brm(events | trials(total) ~ doseg + (doseg | study),
prior=c(prior(student_t(7, 0, 1.5), class='Intercept'),
prior(normal(0, 1), class='b')),
family=binomial(), data=dat.ursino2021)
We see some divergences due to highly varying posterior curvature. We repeat the sampling with a higher adapt_delta, which makes the adapted step size smaller. A higher adapt_delta makes the computation slower, but that is not an issue in this case. If you get divergences with adapt_delta=0.99, it is likely that even larger values won’t help, and you need to consider a different parameterisation, a different model, or more informative priors.
fit_hier2 <- update(fit_hier2, control=list(adapt_delta=0.99))
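The (doseg | study) term gives each study its own intercept and slope deviation. These are drawn jointly from a bivariate normal with between-study standard deviations and a correlation, which brms reports as sd(Intercept), sd(doseg) and cor(Intercept,doseg). The following sketch simulates study-specific probabilities of a DLT from that generative structure. All parameter values are illustrative, not estimates from fit_hier2.

```r
set.seed(4)
n_study <- 13
b0 <- -3.2; b1 <- 0.4                 # population intercept and slope (illustrative)
sd0 <- 0.5; sd1 <- 0.1; rho <- -0.3   # between-study sds and correlation (illustrative)
Sigma <- matrix(c(sd0^2,           rho * sd0 * sd1,
                  rho * sd0 * sd1, sd1^2), nrow = 2)
# Correlated study deviations via the lower Cholesky factor of Sigma
L <- t(chol(Sigma))
z <- matrix(rnorm(2 * n_study), nrow = 2)
u <- t(L %*% z)                       # n_study x 2: intercept and slope deviations
doseg <- 1:10                         # doses on the doseg = dose/100 scale
# Study-specific probability of a DLT at each dose (rows = studies, cols = doses)
theta <- sapply(doseg, function(dg) plogis(b0 + u[, 1] + (b1 + u[, 2]) * dg))
dim(theta)  # 13 studies x 10 doses
```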
LOO-CV comparison
loo_compare(loo(fit_pooled), loo(fit_separate), loo(fit_hier1), loo(fit_hier2))
Warning: Found 13 observations with a pareto_k > 0.7 in model 'fit_separate'.
It is recommended to set 'moment_match = TRUE' in order to perform moment
matching for problematic observations.
Warning: Found 2 observations with a pareto_k > 0.7 in model 'fit_hier1'. It is
recommended to set 'moment_match = TRUE' in order to perform moment matching
for problematic observations.
elpd_diff se_diff
fit_hier1 0.0 0.0
fit_hier2 -0.6 0.5
fit_pooled -1.9 2.6
fit_separate -25.1 5.7
We get warnings about several Pareto k’s > 0.7 in PSIS-LOO for separate model, but as in that case the LOO-CV estimate is usually overoptimistic and the separate model is the worst, there is no need to use more accurate computation for the separate model.
We get warnings about a few Pareto k’s > 0.7 in PSIS-LOO for both hierarchical models. We can improve the accuracy by running MCMC for these LOO folds. We use the add_criterion() function to store the LOO computation results, as they now take a bit longer to compute. We get some divergences in the case of the second hierarchical model, as leaving out an observation from a study that has only two dose levels gives the posterior a difficult shape.
fit_hier1 <- add_criterion(fit_hier1, criterion='loo', reloo=TRUE)
fit_hier2 <- add_criterion(fit_hier2, criterion='loo', reloo=TRUE)
We repeat the LOO-CV comparison (without the separate model). The loo() function uses the results added to the fit objects.
loo_compare(loo(fit_pooled), loo(fit_hier1), loo(fit_hier2))
elpd_diff se_diff
fit_hier1 0.0 0.0
fit_hier2 -0.7 0.5
fit_pooled -2.0 2.7
The results did not change much. The first hierarchical model is slightly better than the other models, but for predictive purposes there is not much difference (there is high aleatoric uncertainty in the predictions). Adding a hierarchical model for the slope decreased the predictive performance, and thus it is likely that there is not enough information about the variation in slopes between studies.
Posterior predictive checking showing the observed and predicted number of events. A rootogram uses the square root of counts on the y-axis for better scaling. Rootograms are useful for count data when the range of counts is small or moderate.
pp_check(fit_pooled, type = "rootogram") +
labs(title='Pooled model')
pp_check(fit_hier1, type = "rootogram") +
labs(title='Hierarchical model 1')
pp_check(fit_hier2, type = "rootogram") +
labs(title='Hierarchical model 2')
We see that the hierarchical models give higher probability to future counts that are bigger than the maximum observed count, and have a longer predictive distribution tail. This is natural, as uncertainty in the variation between studies increases predictive uncertainty, too, especially as the number of studies is relatively small.
The population level coefficient posterior given pooled model
plot_posterior_pooled <- mcmc_areas(as_draws_df(fit_pooled), regex_pars='b_doseg') +
geom_vline(xintercept=0, linetype='dashed') +
labs(title='Pooled model')
The population level coefficient posterior given hierarchical model 1
plot_posterior_hier1 <- mcmc_areas(as_draws_df(fit_hier1), regex_pars='b_doseg') +
geom_vline(xintercept=0, linetype='dashed') +
labs(title='Hierarchical model 1')
The population level coefficient posterior given hierarchical model 2
plot_posterior_hier2 <- mcmc_areas(as_draws_df(fit_hier2), regex_pars='b_doseg') +
geom_vline(xintercept=0, linetype='dashed') +
labs(title='Hierarchical model 2')
(plot_posterior_pooled / plot_posterior_hier1 / plot_posterior_hier2) * xlim(c(0,0.85))
Warning: Removed 1 rows containing missing values (`geom_segment()`).
All models agree that the slope is very likely positive. The hierarchical models have more uncertainty, but also higher posterior means.
When we look at the study-specific parameters, we see that the Miller study has a slightly higher intercept (leading to a higher theta).
(mcmc_areas(as_draws_df(fit_hier1), regex_pars='r_study\\[.*Intercept') +
labs(title='Hierarchical model 1')) +
(mcmc_areas(as_draws_df(fit_hier2), regex_pars='r_study\\[.*Intercept') +
labs(title='Hierarchical model 2'))
There are no clear differences in slopes.
mcmc_areas(as_draws_df(fit_hier2), regex_pars='r_study\\[.*doseg') +
labs(title='Hierarchical model 2')
Based on the LOO comparison, we could continue with any of the models except the separate one, but if we want to take into account the unknown possible study variations, it is best to continue with hierarchical model 2. We could reduce the uncertainty by spending some effort to elicit more informative priors for the between-study variation, for example by searching open study databases for similar studies. In this example, we skip that and continue with other parts of the workflow.
Make a prior sensitivity analysis by power-scaling both the prior and the likelihood for hierarchical model 2, focusing on the population-level slope b_doseg.
powerscale_sensitivity(fit_hier2, variable='b_doseg'
)$sensitivity |>
mutate(across(where(is.double), ~num(.x, digits=2)))
# A tibble: 1 × 4
variable prior likelihood diagnosis
<chr> <num:.2!> <num:.2!> <chr>
1 b_doseg 0.03 0.10 -
The posterior for the probability of event given certain dose and a new study for hierarchical model 2.
# doses 100 to 1000 mg on the doseg = dose/100 scale
data.frame(study='new',
doseg=seq(1,10,by=1),
total=1) |>
add_linpred_draws(fit_hier2, transform=TRUE, allow_new_levels=TRUE) |>
ggplot(aes(x=doseg, y=.linpred)) +
stat_lineribbon(.width = c(.95), alpha = 1/2, color=brewer.pal(5, "Blues")[[5]]) +
scale_fill_brewer()+
labs(x= "Dose (100 mg)", y = 'Probability of event', title='Hierarchical model 2') +
theme(legend.position="none") +
geom_hline(yintercept=0) +
scale_x_continuous(breaks=seq(1,10,by=1))
If we plot individual posterior draws, we see that there is a lot of uncertainty about the overall probability (explained by the variation in Intercept in different studies), but less uncertainty about the slope.
# doses 100 to 1000 mg on the doseg = dose/100 scale
data.frame(study='new',
doseg=seq(1,10,by=1),
total=1) |>
add_linpred_draws(fit_hier2, transform=TRUE, allow_new_levels=TRUE, ndraws=100) |>
ggplot(aes(x=doseg, y=.linpred)) +
geom_line(aes(group=.draw), alpha = 1/2, color = brewer.pal(5, "Blues")[[3]])+
scale_fill_brewer()+
labs(x= "Dose (100 mg)", y = 'Probability of event') +
theme(legend.position="none") +
geom_hline(yintercept=0) +
scale_x_continuous(breaks=seq(1,10,by=1))
The Studies on Pharmacologic Treatments for Chronic Obstructive Pulmonary Disease dataset (dat.baker2009 in the metadat R package) includes results from 39 trials examining pharmacologic treatments for chronic obstructive pulmonary disease (COPD).
Load data
load(url('https://github.com/wviechtb/metadat/raw/master/data/dat.baker2009.rda'))
# force character strings to factors for easier plotting
dat.baker2009 <- dat.baker2009 |>
mutate(study = factor(study),
treatment = factor(treatment),
id = factor(id))
Look at the first six lines of the data frame
head(dat.baker2009)
study year id treatment exac total
1 Llewellyn-Jones 1996 1996 1 Fluticasone 0 8
2 Llewellyn-Jones 1996 1996 1 Placebo 3 8
3 Boyd 1997 1997 2 Salmeterol 47 229
4 Boyd 1997 1997 2 Placebo 59 227
5 Paggiaro 1998 1998 3 Fluticasone 45 142
6 Paggiaro 1998 1998 3 Placebo 51 139
Total number of patients in each study varies a lot
dat.baker2009 |>
group_by(study) |>
summarise(N = sum(total)) |>
ggplot(aes(x=N, y=study)) +
geom_col(fill=4) +
labs(x='Number of patients per study', y='Study')
None of the treatments is included in every study, and each study includes 2 to 4 treatments.
crosstab <- with(dat.baker2009,table(study, treatment))
#
plot_treatments <- data.frame(number_of_studies=colSums(crosstab), treatment=colnames(crosstab)) |>
ggplot(aes(x=number_of_studies,y=treatment)) +
geom_col(fill=4) +
labs(x='Number of studies with a treatment X', y='Treatment') +
geom_vline(xintercept=nrow(crosstab), linetype='dashed') +
scale_x_continuous(breaks=c(0,10,20,30,39))
#
plot_studies <- data.frame(number_of_treatments=rowSums(crosstab), study=rownames(crosstab)) |>
ggplot(aes(x=number_of_treatments,y=study)) +
geom_col(fill=4) +
labs(x='Number of treatments in a study Y', y='Study') +
geom_vline(xintercept=ncol(crosstab), linetype='dashed') +
scale_x_continuous(breaks=c(0,2,4,6,8))
#
plot_treatments + plot_studies
The first model is pooling the information over studies, but estimating separate theta for each treatment (including placebo).
fit_pooled <- brm(exac | trials(total) ~ 0 + treatment,
prior = prior(student_t(7, 0, 1.5), class='b'),
family=binomial(), data=dat.baker2009)
Check the summary of the posterior and inference diagnostics.
fit_pooled
Family: binomial
Links: mu = logit
Formula: exac | trials(total) ~ 0 + treatment
Data: dat.baker2009 (Number of observations: 94)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
treatmentBudesonide -0.31 0.09 -0.49 -0.12 1.00 6178 2969
treatmentBudesonidePFormoterol -0.49 0.09 -0.68 -0.31 1.00 6562 3263
treatmentFluticasone 0.35 0.04 0.28 0.43 1.00 6524 2696
treatmentFluticasonePSalmeterol 0.12 0.03 0.06 0.19 1.00 7526 3198
treatmentFormoterol -0.71 0.06 -0.84 -0.59 1.00 7062 2939
treatmentPlacebo -0.28 0.02 -0.32 -0.24 1.00 6714 2822
treatmentSalmeterol -0.38 0.03 -0.44 -0.33 1.00 6200 2592
treatmentTiotropium -0.90 0.03 -0.96 -0.84 1.00 7609 3139
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Treatment effect posteriors
fit_pooled |>
as_draws_df() |>
subset_draws(variable='b_', regex=TRUE) |>
set_variables(paste0('b_treatment[', levels(factor(dat.baker2009$treatment)), ']')) |>
as_draws_rvars() |>
spread_rvars(b_treatment[treatment]) |>
mutate(theta_treatment = rfun(plogis)(b_treatment)) |>
ggplot(aes(xdist=theta_treatment, y=treatment)) +
stat_halfeye() +
labs(x='theta', y='Treatment', title='Pooled over studies, separate over treatments')
Treatment effect odds-ratio posteriors
theta <- fit_pooled |>
as_draws_df() |>
subset_draws(variable='b_', regex=TRUE) |>
set_variables(paste0('b_treatment[', levels(factor(dat.baker2009$treatment)), ']')) |>
as_draws_rvars() |>
spread_rvars(b_treatment[treatment]) |>
mutate(theta_treatment = rfun(plogis)(b_treatment))
theta_placebo <- filter(theta,treatment=='Placebo')$theta_treatment[[1]]
theta |>
mutate(treatment_oddsratio = (theta_treatment/(1-theta_treatment))/(theta_placebo/(1-theta_placebo))) |>
filter(treatment != "Placebo") |>
ggplot(aes(xdist=treatment_oddsratio, y=treatment)) +
stat_halfeye() +
labs(x='Odds-ratio', y='Treatment', title='Pooled over studies, separate over treatments') +
geom_vline(xintercept=1, linetype='dashed')
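With a logit link, theta/(1 - theta) = exp(b). The odds ratio computed above from probabilities therefore equals exp(b_treatment - b_placebo) on the linear-predictor scale. The check below uses the posterior means for Tiotropium and Placebo from the pooled summary. The code above applies the same formula per draw through rvars, which is preferable to plugging in posterior means.

```r
# Logit-scale posterior means from the pooled model summary
b_placebo <- -0.28
b_tiotropium <- -0.90
theta_p <- plogis(b_placebo)
theta_t <- plogis(b_tiotropium)
# Odds ratio from probabilities, as in the plotting code
or_theta <- (theta_t / (1 - theta_t)) / (theta_p / (1 - theta_p))
# Same quantity directly from the logit-scale coefficients
or_logit <- exp(b_tiotropium - b_placebo)
c(or_theta = or_theta, or_logit = or_logit)
```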
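We see a big variation between treatments, and two treatments seem to be harmful (odds-ratio above 1 compared to placebo), which is suspicious. Looking at the data, we see that not all studies included all treatments. This model ignores which study an observation comes from. If some studies had more events overall, the treatments included in those studies get inflated event probabilities, and the above estimates can then be wrong.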
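The mechanism can be shown with a deliberately simple, hypothetical data set. This is not dat.baker2009. Each of two studies compares placebo with a different treatment, the two studies have different baseline event rates, and within each study the treatment has exactly the same event rate as placebo. To keep the sketch fast and deterministic it uses base R's maximum likelihood glm() instead of brm(). This substitutes a non-Bayesian fit purely for illustration.

```r
# Hypothetical aggregated data: study A has a high baseline event rate,
# study B a low one; within each study, treatment = placebo
d <- data.frame(
  study     = factor(c("A", "A", "B", "B")),
  treatment = factor(c("Placebo", "X", "Placebo", "Y"),
                     levels = c("Placebo", "X", "Y")),
  events    = c(30, 30, 10, 10),
  total     = c(100, 100, 100, 100)
)

# Pooled over studies, separate over treatments
fit_pooled_ml <- glm(cbind(events, total - events) ~ 0 + treatment,
                     family = binomial(), data = d)
# Adding a study term absorbs the difference in baseline rates
fit_study_ml <- glm(cbind(events, total - events) ~ treatment + study,
                    family = binomial(), data = d)

b <- coef(fit_pooled_ml)
# Pooled model: X looks harmful, Y looks beneficial relative to placebo
exp(b[["treatmentX"]] - b[["treatmentPlacebo"]])
exp(b[["treatmentY"]] - b[["treatmentPlacebo"]])
# With study effects, both odds-ratios are (numerically) 1
exp(coef(fit_study_ml)[c("treatmentX", "treatmentY")])
```

Neither treatment does anything, yet the pooled fit makes X look harmful and Y look beneficial. Accounting for the study removes the artefact.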
The target is a discrete count, but as the range of counts is large, a rootogram would look messy, and a density overlay plot is a better choice. Posterior predictive checking with kernel density estimates for the data and 10 posterior predictive replicates shows a clear discrepancy.
pp_check(fit_pooled, type='dens_overlay')
Posterior predictive checking with PIT values and an ECDF difference plot with an envelope shows a clear discrepancy.
pp_check(fit_pooled, type='pit_ecdf', ndraws=4000)
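For each observation, the probability integral transform (PIT) value is the predictive probability of a replicate being at most the observed value. Under a well-calibrated model these values are close to uniform. The sketch below simulates a situation resembling the pooled model, where the data vary a lot between studies but the predictive distribution uses a single pooled probability. The simulation settings (100 observations, 200 trials, logit mean -0.9 and sd 1.2) are illustrative assumptions only. It uses the randomized PIT for discrete data, and bayesplot's internal computation may differ in detail.

```r
set.seed(48927)

# Hypothetical data: each observation has its own event probability
n_obs <- 100
n_trials <- rep(200, n_obs)
p_true <- plogis(rnorm(n_obs, mean = -0.9, sd = 1.2))
y_sim <- rbinom(n_obs, size = n_trials, prob = p_true)

# Predictive draws from one pooled probability (its small posterior
# uncertainty is ignored in this sketch)
p_pooled <- sum(y_sim) / sum(n_trials)
n_draws <- 1000
yrep_sim <- matrix(rbinom(n_draws * n_obs, size = rep(n_trials, each = n_draws),
                          prob = p_pooled),
                   nrow = n_draws, ncol = n_obs)

# Randomized PIT for discrete data: P(yrep < y) + U * P(yrep == y)
pit <- colMeans(sweep(yrep_sim, 2, y_sim, `<`)) +
  runif(n_obs) * colMeans(sweep(yrep_sim, 2, y_sim, `==`))

# ECDF difference: stays near zero if the PIT values are uniform
u <- seq(0, 1, by = 0.01)
ecdf_diff <- ecdf(pit)(u) - u
max_ecdf_diff <- max(abs(ecdf_diff))
# Share of PIT values in the extreme tails (about 0.1 if uniform)
tail_frac <- mean(pit < 0.05 | pit > 0.95)
max_ecdf_diff
tail_frac
```

The PIT values pile up near 0 and 1, so the ECDF difference leaves any reasonable envelope. This is the same kind of overdispersion signal that the pit_ecdf plot shows for fit_pooled.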
Posterior predictive checking with LOO-PIT values shows a clear discrepancy.
pp_check(fit_pooled, type='loo_pit_qq', ndraws=4000) +
geom_abline() +
ylim(c(0,1))
Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
Warning: Removed 10 rows containing missing values (`geom_point()`).
Warning: Removed 2 rows containing missing values (`geom_path()`).
The second model uses a hierarchical model for both treatment effects and study effects.
fit_hier <- brm(exac | trials(total) ~ (1 | treatment) + (1 | study),
family=binomial(), data=dat.baker2009)
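In model notation, the formula exac | trials(total) ~ (1 | treatment) + (1 | study) corresponds to the following. Here i indexes observations, t[i] the treatment and s[i] the study of observation i, and priors are omitted because brms defaults are used. The two sd(Intercept) rows in the summary below are the posteriors of the two group-level scales.

```latex
\begin{aligned}
\mathrm{exac}_i &\sim \mathrm{Binomial}(\mathrm{total}_i, \theta_i), \\
\operatorname{logit}(\theta_i) &= \beta_0 + r^{\mathrm{treatment}}_{t[i]} + r^{\mathrm{study}}_{s[i]}, \\
r^{\mathrm{treatment}}_{t} &\sim \mathrm{normal}(0, \sigma_{\mathrm{treatment}}), \quad t = 1, \dots, 8, \\
r^{\mathrm{study}}_{s} &\sim \mathrm{normal}(0, \sigma_{\mathrm{study}}), \quad s = 1, \dots, 39.
\end{aligned}
```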
Check the summary of the posterior and inference diagnostics.
fit_hier
Family: binomial
Links: mu = logit
Formula: exac | trials(total) ~ (1 | treatment) + (1 | study)
Data: dat.baker2009 (Number of observations: 94)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Group-Level Effects:
~study (Number of levels: 39)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.21 0.17 0.92 1.57 1.00 533 987
~treatment (Number of levels: 8)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.17 0.07 0.08 0.35 1.00 1079 1564
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -0.89 0.21 -1.30 -0.49 1.02 458 755
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
LOO-CV comparison
loo_compare(loo(fit_pooled), loo(fit_hier))
Warning: Found 25 observations with a pareto_k > 0.7 in model 'fit_pooled'. It
is recommended to set 'moment_match = TRUE' in order to perform moment matching
for problematic observations.
Warning: Found 24 observations with a pareto_k > 0.7 in model 'fit_hier'. It is
recommended to set 'moment_match = TRUE' in order to perform moment matching
for problematic observations.
elpd_diff se_diff
fit_hier 0.0 0.0
fit_pooled -1945.2 299.6
We get warnings about Pareto k’s > 0.7 in PSIS-LOO. However, the difference between the models is huge compared to its standard error, so we can be confident that the order would be the same if we fixed the computation. The hierarchical model is much better, and there is high variation between studies. The many high Pareto k values also show that there are many highly influential observations.
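loo_compare() reports the difference in elpd between models and the standard error of that difference. The sketch below shows the computation from pointwise LOO elpd values. It assumes the standard error is sqrt(n) times the standard deviation of the pointwise differences, which is our understanding of the loo package. The pointwise values here are toy numbers, not taken from the fits.

```r
# elpd difference and its standard error from pointwise elpd values
# (elpd_a, elpd_b: pointwise LOO elpd for the same observations)
elpd_diff_se <- function(elpd_a, elpd_b) {
  diffs <- elpd_b - elpd_a
  n <- length(diffs)
  c(elpd_diff = sum(diffs), se_diff = sqrt(n) * sd(diffs))
}

# Toy pointwise values
elpd_a <- c(-1, -2, -3, -4)
elpd_b <- c(-2, -2, -4, -4)
res <- elpd_diff_se(elpd_a, elpd_b)
res
# Difference in units of its standard error
res[["elpd_diff"]] / res[["se_diff"]]
```

Applied to the reported values, fit_pooled is about 1945.2 / 299.6 ≈ 6.5 standard errors worse than fit_hier. In the later comparison, fit_hier and fit_hier2 differ by only about 1.1 / 3.1 ≈ 0.35 standard errors.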
Posterior predictive checking with kernel density estimates for the data and 10 posterior predictive replicates looks good (although with this many parameters, this check is likely to be optimistic).
pp_check(fit_hier, type='dens_overlay')
Posterior predictive checking with PIT values and an ECDF difference plot with an envelope looks good (although with this many parameters, this check is likely to be optimistic).
pp_check(fit_hier, type='pit_ecdf', ndraws=4000)
Posterior predictive checking with LOO-PIT values looks good (although as there are Pareto-khat warnings, it is possible that this diagnostic is optimistic).
pp_check(fit_hier, type='loo_pit_qq', ndraws=4000) +
geom_abline() +
ylim(c(0,1))
Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
Warning: Removed 2 rows containing missing values (`geom_point()`).
Warning: Removed 2 rows containing missing values (`geom_path()`).
Treatment effect posteriors now have much less variation.
fit_hier |>
spread_rvars(b_Intercept, r_treatment[treatment,]) |>
mutate(theta_treatment = rfun(plogis)(b_Intercept + r_treatment)) |>
ggplot(aes(xdist=theta_treatment, y=treatment)) +
stat_halfeye() +
labs(x='theta', y='Treatment', title='Hierarchical over studies, hierarchical over treatments')
Study effect posteriors show the expected high variation.
fit_hier |>
spread_rvars(b_Intercept, r_study[study,]) |>
mutate(theta_study = rfun(plogis)(b_Intercept + r_study)) |>
ggplot(aes(xdist=theta_study, y=study)) +
stat_halfeye() +
labs(x='theta', y='Study', title='Hierarchical over studies, hierarchical over treatments')
Treatment effect odds-ratio posteriors
theta <- fit_hier |>
spread_rvars(b_Intercept, r_treatment[treatment,]) |>
mutate(theta_treatment = rfun(plogis)(b_Intercept + r_treatment))
theta_placebo <- filter(theta,treatment=='Placebo')$theta_treatment[[1]]
theta |>
mutate(treatment_oddsratio = (theta_treatment/(1-theta_treatment))/(theta_placebo/(1-theta_placebo))) |>
filter(treatment != "Placebo") |>
ggplot(aes(xdist=treatment_oddsratio, y=treatment)) +
stat_halfeye() +
labs(x='Odds-ratio', y='Treatment', title='Hierarchical over studies, hierarchical over treatments') +
geom_vline(xintercept=1, linetype='dashed')
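Simulated draws show why the odds-ratios can be much sharper than the thetas. The sketch assumes independent normal draws for the shared intercept and for two treatment deviations. These are hypothetical values, loosely on the scale of this fit, whereas the real posterior draws are correlated. It computes the odds-ratio the same way as the mutate() call above.

```r
set.seed(48927)
n_draws <- 4000

# Hypothetical draws: uncertain shared intercept, precise treatment deviations
b_intercept <- rnorm(n_draws, mean = -0.9, sd = 0.2)
r_placebo   <- rnorm(n_draws, mean =  0.1, sd = 0.03)
r_active    <- rnorm(n_draws, mean = -0.2, sd = 0.03)

theta_placebo <- plogis(b_intercept + r_placebo)
theta_active  <- plogis(b_intercept + r_active)

or_draws <- (theta_active / (1 - theta_active)) /
  (theta_placebo / (1 - theta_placebo))

# The intercept cancels: odds-ratio = exp(r_active - r_placebo)
all.equal(or_draws, exp(r_active - r_placebo))

# theta inherits the intercept uncertainty, the odds-ratio does not
sd(qlogis(theta_active))
sd(log(or_draws))
```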
Treatment effect odds-ratios now look more reasonable. As all treatments are compared to placebo, the odds-ratio distributions overlap less than the theta distributions did. All thetas include similar uncertainty about the overall theta due to the high variation between studies, and this shared uncertainty cancels out when dividing by the placebo odds.
The third model includes an interaction so that the treatment effect can depend on the study.
fit_hier2 <- brm(exac | trials(total) ~ (1 | treatment) + (treatment | study),
family=binomial(), data=dat.baker2009, control=list(adapt_delta=0.9))
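The term (treatment | study) gives each study its own intercept and its own deviation for each of the other seven treatments. The formulation below assumes brms's usual dummy coding of the factor, with the first level (Budesonide) as the reference. The eight study-level terms get a joint multivariate normal prior with 8 standard deviations and an 8 × 8 correlation matrix. In it, x_{ik} = 1 if observation i has the k-th non-reference treatment and 0 otherwise.

```latex
\begin{aligned}
\operatorname{logit}(\theta_i) &= \beta_0 + r^{\mathrm{treatment}}_{t[i]} + u_{s[i],0} + \sum_{k=1}^{7} u_{s[i],k}\, x_{ik}, \\
(u_{s,0}, u_{s,1}, \dots, u_{s,7})^\top &\sim \mathrm{MVN}(\mathbf{0}, \Sigma_{\mathrm{study}}), \quad s = 1, \dots, 39.
\end{aligned}
```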
LOO-CV comparison
loo_compare(loo(fit_hier), loo(fit_hier2))
Warning: Found 24 observations with a pareto_k > 0.7 in model 'fit_hier'. It is
recommended to set 'moment_match = TRUE' in order to perform moment matching
for problematic observations.
Warning: Found 46 observations with a pareto_k > 0.7 in model 'fit_hier2'. It
is recommended to set 'moment_match = TRUE' in order to perform moment matching
for problematic observations.
elpd_diff se_diff
fit_hier2 0.0 0.0
fit_hier -1.1 3.1
We get warnings about Pareto k’s > 0.7 in PSIS-LOO. However, the models are similar and the difference is small compared to its standard error, so we can be relatively confident that the more complex model is not better.